Genetic markers in Andean Puya species (Bromeliaceae) with implications on plastome evolution and phylogeny

Abstract The Andean plant endemic Puya is a striking example of recent and rapid diversification from central Chile to the northern Andes, tracking mountain uplift. This study generated 12 complete plastomes representing nine Puya species and compared them to five published plastomes for their features, genomic evolution, and phylogeny. The total size of the Puya plastomes ranged from 159,542 to 159,839 bp with 37.3%–37.4% GC content. The Puya plastomes were highly conserved in organization and structure with a typical quadripartite genome structure. Each of the 17 consensus plastomes harbored 133 genes, including 87 protein‐coding genes, 38 tRNA (transfer RNA) genes, and eight rRNA (ribosomal RNA) genes; we found 69–78 tandem repeats, 45–60 SSRs (simple sequence repeats), and 8–22 repeat structures among 13 species. Four protein‐coding genes were identified under positive site‐specific selection in Puya. The complete plastomes and hypervariable regions collectively provided pronounced species discrimination in Puya and a practical tool for future phylogenetic studies. The reconstructed phylogeny and estimated divergence time for the lineage suggest that the diversification of Puya is related to Andean orogeny and Pleistocene climatic oscillations. This study provides plastome resources for species delimitation and novel phylogenetic and biogeographic studies.

The phylogenetic relationships of Andean Puya have not been fully resolved. The first molecular phylogenetic study of Puya, based on three plastid sequences and one single-copy nuclear DNA sequence, supported the monophyly of Puya, dividing the genus into two major clades, "Core Puya" and "Chilean Puya" (Jabaily & Sytsma, 2010). However, the level of informative sequence variation for the "Core Puya" clade was too low to resolve species relationships. The same authors (Jabaily & Sytsma, 2013) later analyzed 885 amplified fragment length polymorphism (AFLP) sequences for 75 taxa and the results clustered Puya species into two major well-supported clades corresponding to "Core Puya" and "Blue Puya" (Chilean species with blue flowers); however, species relationships were also weakly supported.
Genome skimming is a straightforward next-generation sequencing approach to obtain genetic sequences of the complete plastome for plant species (Dodsworth, 2015). In recent years, plastomes have become a versatile tool for significantly increasing resolution at low taxonomic levels in plant phylogenetic and phylogeographic analyses (Li et al., 2015;Parks et al., 2009). To test the efficacy of plastome sequences for resolving phylogenetic relationships within Puya, we generated 12 new Puya plastomes representing nine species and compared them to five published Puya accessions. Here, we address the following objectives: (1) identify particular features, structures, repeat regions, and positive selection sites of plastomes; (2) assess the utility of plastomes for phylogenetic reconstruction within Puya; (3) screen potential DNA barcodes for species discrimination; and (4) estimate divergence times of major lineages of Puya.
Total genomic DNA was extracted from silica-dried leaves using the CTAB (cetyltrimethylammonium bromide) method (Doyle & Doyle, 1987). The library construction and sequencing were executed by the Beijing Genomics Institute (BGI, Wuhan, China). Covaris was used to randomly fragment 1 μg of genomic DNA, followed by fragment selection with an average size of 200-400 bp, PCR amplification, and purification using the Agencourt AMPure XP-Medium kit. Single-strand circular DNAs were generated as the final library.

| Repeat analyses
A tandem repeat in DNA is composed of two or more adjacent, approximate copies of a pattern of nucleotides. Tandem Repeats Finder v4.09 (Benson, 1999) was used to detect tandem repeats, setting the minimum alignment score to 80. Simple sequence repeats (SSRs) were examined by Perl script MicroSAtellite (MISA) (https://webbl ast.ipk-gater sleben.de/misa/; Beier et al., 2017) with the following parameters: minimum SSR motif length of 10 bp and repeat length of mono-10, di-6, tri-5, tetra-5, penta-5, and hexa-5. The maximum size of interruptions allowed between two different SSRs in a compound SSR was 100 bp. We also identified various types of dispersed repeats using REPuter (https://bibis erv.cebit ec.uni-biele feld.de/reputer; Kurtz et al., 2021), setting the minimum repeat size of 30 bp, the maximum size of 100 bp, and hamming distance of 3. Repeats were classified into the following groups: (1) forward or direct repeats (F); (2) repeats found in reverse orientation (R); (3) palindromic repeats forming hairpin loops in their structure (P); and (4) repeats found in reverse complement orientation (C). Because REPuter overestimates the number of repeats, those with less than 30 bp in length and a redundant output were manually filtered.

| Variation analyses
The plastome sequences were aligned using the default option implemented in MAFFT v7.467 (Katoh & Standley, 2013   to detect signatures of positive selection on each gene. The likelihood ratio test (LRT) and the Bayes Empirical Bayes (BEB) method were used to identify sites under positive selection. Gene sites with a dN/dS > 1, p < .05 and posterior probability >.95 were considered as putatively selected.

| Phylogenetic analyses
To evaluate the efficacy of plastome data for phylogenetic tree reconstruction, we assembled datasets that included sequences from 13 Puya species, 12 plastome assemblies generated here, plus five

| Divergence time estimation
Based on the phylogenetic analyses of the plastomes of 13 Puya species, divergence time estimations were inferred using BEAST2 (Bouckaert et al., 2019). To avoid biased results by using a single species as an outgroup, we included Ananas comosus (NC_026220.  (Drummond et al., 2006), and a Yule process tree prior (Yule, 1925). The MCMC chains (Drummond et al., 2002) were conducted with chain lengths set to 100,000,000 and sampling frequencies to 10,000. The convergence of output log files was analyzed using Tracer v1.5, and runs were continued until the effective sample sizes were >200.

| Plastome features
Overall, 20,000,000-25,268,314 PE 150 bp reads were obtained from 12 newly generated accessions ( The plastome sequences of 17 Puya accessions were aligned and annotated using reference-based (P. mirabilis) and de novo method. The results showed that the plastomes were highly conserved within Puya. Also, no rearrangement events were detected among the Puya species. All plastomes encode 133 genes, including 87 protein-coding genes, 38 tRNA (transfer RNA) genes, and eight rRNA (ribosomal RNA) genes, with identical gene order ( Figure 2;    Table 1).
The most and the least prevalent amino acids coded were leucine (10.3%) and cysteine (1.2%), respectively (Appendix S3). Meanwhile, the third codon used for every amino acid was biased towards A and T in the Puya plastomes (Figure 4), corresponding to many other land plants (Duan et al., 2021;Liu et al., 2020;Ravi et al., 2007).

| Plastome repeats
We identified 69-78 tandem repeats, ranging from 3 to 59 bp (Appendix S4). The most common size was 10-19 bp, primarily located in the LSC and SSC regions. We also identified 45-60 SSRs, all of which are in the LSC and SSC regions (Appendix S5); 39-57 SSRs are mononucleotides, and 2-6 are dinucleotides. In addition, one SSR is tri-nucleotide (P. ferruginea), and one is tetra-nucleotide (P. goudotiana and P. santosii). We detected 4-10 compound SSRs in the Puya plastomes with a maximum interruption of 100 bp. The most abundant motif was poly-A/T in a proportion of 86%-94%.
The 16-17 large repeats were recorded in most Puya plastomes: 4-5 forward, 0-1 reverse, 11-13 palindromic, and 0-1 complement repeats (Appendix S6). Most dispersed repeats are distributed in the intergenic spacer regions of the genome, located mainly in the LSC region. The most variable region was the intergenic spacer trnS GCU -trnG GCC region, including forward, palindromic, and reverse complement repeats.

| Genomic diversity and selection pressure of Puya
The sliding window analyses using DnaSP software revealed highly variable regions in the plastome of Puya ( Figure 5 io/view/intro ducti on-to-calcu latin g-dn-ds-ratio s-with-code-qhwdt 7e?step=4). Since positive selection was unlikely to affect all sites of one gene over a prolonged time, we targeted our detection to some specific sites under positive selection using site models, which allowed the ratio of dN/dS to vary among codons. We found seven sites spanning four genes (rbcL, rpoC1, rpl20, and ycf1) exhibiting site-specific selection (Table 3). Of these genes, the rbcL gene harbored two sites under positive selection, with three in rpoC1, two in rpl20, and one in ycf1 (Table 3).

| Phylogenetic analyses
The BI and ML analyses revealed the phylogenetic relationships among Puya species (Figure 6). The topology of the BI tree is almost congruent with the ML tree in each of the three datasets. Based on the complete plastomes dataset, the tree is consistent with the tree generated from the hypervariable regions but differs from that based on the standard barcodes. The highest support values F I G U R E 2 Gene map of the complete plastome of Puya. The figure shows a circular representation of the Puya plastome with structural organization of the gene content ring which was color coded based on its functional category. The dark gray lines in the innermost circle denotes the GC content across the genome. The genes that were transcribed counter-clockwise and clockwise were at the outer and inner ring, respectively.
(posterior probabilities/PP = 1, bootstrap support/BS ≥98% mostly) were obtained when using the complete plastome, with slightly weaker support in the hypervariable regions (PP >0.97, BS ≥87% mostly), and most relationships were not fully resolved in the standard barcodes. Therefore, the relationships among species described below are based on the complete plastome dataset.
Interspecific relationships among the species examined were strongly supported for the most part (PP = 1, BS ≥98%, Figure 6a

| Analyses of potential barcodes
We evaluated the efficacy of the three datasets (the complete plastomes, the combination of 12 hypervariable regions, and three standard barcodes) to discern Puya species. Based on the phylogenetic trees, species identification was considered successful (100% of the species resolved) only if all conspecific individuals formed a monophyletic clade with more than 50% BS support. For the hypervariable dataset (BS ≥80%) and the complete plastomes dataset (BS ≥65%), the four species with two accessions (i.e., P. hutchisonii, P. macropoda, P. macrura, and P. raimondii) were correctly identified to species (Figure 6a,b). However, the tree based on the standard barcodes dataset (rbcL + matK + trnH-psbA) failed to distinguish all these species (Figure 6c). Thus, we concluded that the hypervariable regions and the plastomes of Puya were potential barcodes for assigning species in Puya. If the hypervariable regions provide insufficient resolution in some circumstances, plastomes may provide more characters. However, the three standard barcodes showed to be poor candidates as DNA barcodes for Puya.

| Divergence time estimation
Based on the complete plastomes dataset, the divergence time was estimated for the 13 species of Puya, representing the significant clades ( Figure 7).

| Plastome evolution in Puya
Puya is characterized by rapid diversifications, leading to the challenge of reconstructing its evolutionary history (Givnish et al., 2011). Among 17 Puya accessions, the nucleotide identity of all CDSs is nearly 99%, showing few differences in this genus.
Thus, the hypervariable regions with relatively high genetic variation may serve as effective markers for elucidating phylogenetic relationships and species discrimination in Puya. In previous studies, the intergenic spacer of the rpoB and psbD gene regions show high divergence in pineapple (Redwan et al., 2015). Compared to the pineapple plastome, two regions of rpoB-psbD in Puya showed high nucleotide diversity, including rpoB-trnC GCA and trnC GCA -petN. We detected 10 additionally highly variable regions, including trnK UUU -rps16, psbK-psbI, psbI-trnS GCU , trnS GCU -trnG GCC , psbC-trnS UGA , trnS and rpl32-trnL UAG . Collectively, we suggest these 12 hypervariable sequences may act as useful markers for phylogenetic studies in

Puya.
Plastid genes are essential to plant metabolism, and genes under negative selection tend to maintain protein functions, while those under positive selection may drive adaptive divergence (Burri et al., 2010;Piot et al., 2018). Although most plastomes are highly conserved in sequence and structure within angiosperms, with dN/dS values less than 1 (Jansen et al., 2007;Wicke et al., 2011), our results showed that four protein-coding genes (rbcL, rpoC1, rpl20, and ycf1) undergo significant site-specific selection pressure (Table 3). Recent studies indicate that positive selection in these four genes may be a widespread phenomenon; rbcL and rpl20 in Bromeliaceae (Hermida-Carrera et al., 2020;Redwan et al., 2015), rpoC1 in Zingiberaceae , and ycf1 in Asteraceae (Kim et al., 2020).
While the ycf1 gene is required for photosynthetic protein import and essential for plant fitness and viability (de Vries et al., 2015), the large subunit of Rubisco encoded by the rbcL gene is the cornerstone of photosynthesis and is responsible for converting inorganic carbon into organic compounds (Spreitzer & Salvucci, 2002). Here, positive selection in the rbcL gene of terrestrial land plants is ubiquitous (Kapralov & Filatov, 2007), reflecting adaptation to environmental changes and the corresponding co-evolution of proteins in the Rubisco complex (Tcherkez et al., 2006). In addition, the rpoC gene encodes the major catalytic subunit of RNA polymerase, which is essential for plastome biogenesis (Liebers et al., 2018), and the rpl20 gene encodes large ribosomal subunits, which is indispensable in ribosome biogenesis, protein synthesis, cell growth, development, and apoptosis Saha et al., 2017). Our finding that all four of these genes are under positive selection may indicate that natural selection targets different plastid functions, supporting the possible involvement of plastid genes in adaptation and speciation processes (Greiner & Bock, 2013).

F I G U R E 5
Nucleotide diversity of the entire plastome of 17 studied Puya accessions. X-axis: Position with the unit of 5000 bp; Y-axis: Nucleotide diversity of each window. Only the regions with the highest nucleotide diversities (pi > .007) were labeled.

| Phylogenomics application
Plastome data are widely used to construct plant phylogenies (Davis et al., 2014;Li et al., 2019) mainly because they can resolve phylogenetic relationships among recent and rapid diverging groups (Zhao et al., 2021). Based on molecular markers, including nuclear and plastid sequences and AFLP data (Givnish et al., 2011;Jabaily & Sytsma, 2010, only two significant clades, "Core Puya" and "Chilean Puya," were recovered with robust support. In this study, three well-supported clades were clustered geographically (Figure 6a), with Clade I distributed in Chile, Clade II ranging from Bolivia to Ecuador, and Clade III in Colombia (except for P. hamata which ranges from Colombia to Peru) corresponding to "Chilean Puya," "Central Andes Puya,", and "Northern Andes Puya" (Jabaily & Sytsma, 2010). This strong spatial signature may indicate that the Puya species in our study might have experienced two geographically delimited and mostly disconnected radiations in Chile and the high Andes, respectively, followed by dispersal events to the Northern Andes. Similar patterns have been found in other Andes species, for example, Espeletiinae (Pouchon et al., 2018). Given the limited sampling regime in our study, we acknowledge that more Puya species need to be collected to verify that the geographic patterns found here also hold for all 229 species in the genus.
Both interspecific and intraspecific relationships among the studied Puya species were well resolved. Our results also corroborated TA B L E 3 Positive selection sites identified in the plastome of Puya. The log likelihoods (Ln L) and the p-value of the likelihood ratio test (LRT) were calculated for every two models. The positive sites with * are significant (.001 < p < .05), those with ** are highly significant (p < .001)
Puya species P. alpestris in Clade I and P. raimondii in Clade II are not as sister species. These results highlight the possible independent evolution of the sterile inflorescence apex in subg. Puya species. It has been previously suggested that the sterile inflorescence apex may act as feeding stations and perches for passerine and hummingbird pollinators (Hornung-Leoni et al., 2013). Thus, this character may not be an appropriate diagnostic of subg. Puya since it seems to have experienced several convergent evolutions events to attract pollinators (Anderson et al., 2005;Hornung-Leoni & Sosa, 2008;Jabaily & Sytsma, 2010;Johow, 1898). A similar trend has been proposed recently for the evolution of the Andean genus Espeletia (Pouchon et al., 2018).
In mesangiospermae, conflicting topologies between the plastome and nuclear trees were widely detected due to their inconsistent genetic patterns (Leebens-Mack et al., 2019;Li et al., 2019;Ma et al., 2021;Yang et al., 2020). Several processes could explain ambiguous and conflicting topologies, such as hybridization, polyploidization, and incomplete lineage sorting (ILS; Degnan & Rosenberg, 2009;Paule et al., 2020;Rieseberg & Soltis, 1991;Ye et al., 2021). Comparing our plastid trees with biparentally inherited nuclear gene trees , incongruences were observed in the placement of P. raimondii and P. macrura. In the plastome phylogeny, P. raimondii is sister to the clade comprising P. hutchisonii, P. macropoda, and P. macrura, and P. macrura is sister to P. macropoda. However, a different topology built with nuclear genes weakly rejected the sister relationships between these species . The sparse sampling has greatly limited our understanding of phylogenetic discordances in this study.
ILS is a random process that should not necessarily lead to the geographic footprint  in the phylogenetic clustering of plastid markers that we demonstrated here. ILS is not likely to explain the cytonuclear discordance observed in Puya, as is the case for high-elevation Andes Espeletia (Pouchon et al., 2018). Polyploidy is not known in Puya and is very rare in Bromeliaceae (Brown & Gilmartin, 1989;Gitaí et al., 2005;Smith & Downs, 1974), suggesting that it does not account for the discordance. Nevertheless, there is a reasonable doubt that polyploidy has not yet been revealed in Puya, offering an incentive for future explorations.
Hybridization is common in plants and has a principal role in the origin of new species (Mallet, 2007;Xu et al., 2021). In the high-elevation Andes, hybridization is also a relatively common occurrence, rendered by "flickering connectivity" and the diversity of habitats, resulting from Pleistocene climate changes and the steep ecological clines along mountain slopes, respectively (Schley et al., 2021). The hybridization hypothesis stands out as the most probable explanation for the biogeographic patterns and the extremely short branch length observed between P. macrura and P. macropoda (Figure 6a). Puya is characterized by bird pollination and winged seeds, providing extensive pollen and seed dispersal and a strong connection among populations (Smith & Downs, 1974). Given that recent hybridization among other Puya species has been verified (Jabaily & Sytsma, 2010, assessing the magnitude of gene flow or hybridization in further phylogenetic studies of Puya, or even in the family Bromeliaceae (Matallana et al., 2016;Wendt et al., 2008), is warranted.

| Screening of potential DNA barcodes
DNA barcoding is a valuable tool for species identification (Hebert et al., 2003). Four plant DNA barcode markers, rbcL, matK, trnH-psbA, and ITS2, have been suggested as the plant standard barcodes (Kress, 2017). However, for DNA barcoding, discriminating among taxa with complex recent radiations remains a difficult challenge (Kress, 2017;Spooner, 2009). Our study showed that the combination of rbcL, matK, and trnH-psbA sequences has relatively low genetic variability and low species resolution in the genus Puya ( Figure 6c). Because of the inherent limitation of standard barcodes and recent decreases in sequencing costs, complete plastome data is a useful tool for the next generation of DNA barcodes that have higher interspecific and lower intraspecific divergence (Barrett et al., 2016;Coissac et al., 2016;Huang et al., 2014;Ji et al., 2019;Knox, 2014;Li et al., 2015;Moore et al., 2010;Ruhsam et al., 2015;Song et al., 2020;Zhang et al., 2021). We found more variable characters in the complete plastomes of Puya, and most interspecific relationships were resolved with robust support (Figures 5 and 6a); four species, with two accessions each, were correctly identified using their complete plastomes. Therefore, we suggest using complete plastomes as a practical "super-barcode" for phylogenetic reconstruction and species identification in the genus Puya.
Taxon-specific barcodes may also enhance species discrimination rates because they typically provide more genetic information within a particular group of species than standard DNA barcodes used across taxa of broad phylogenetic dispersion . Considering this factor, taxon-specific barcodes with sufficiently high mutation rates are now widely used, representing an intermediate trade-off between standard barcode DNA and "super-barcodes" (Li et al., 2015;Li, Gichira, et al., 2021). Our results also show that the resolution of hypervariable regions was similar to that of the complete plastomes (Figure 6a,b), which indicates that the 12 hypervariable regions screened in this study can also be used as DNA barcodes for phylogenetic applications and phylogeographic investigations.

| Divergence of Puya
Unsurprisingly, the divergence time estimates (stem and crown age of Puya) under a secondary calibrated crown node agreed with previous studies (Givnish et al., 2011;Möbus et al., 2021).
Such mountain building promoted the formation of a barrier for Amazonian moisture, intensifying aridity and seasonality on the western slopes (Houston & Hartley, 2003). Thus, the final and recent rise of the Andes since the late Miocene may have driven the formation of suitable habitats for Puya species and segregated the Andean Puya from the Chileans (Jabaily & Sytsma, 2010).
The Pleistocene glacial cycles have also been suggested to promote population expansion and contraction, driving a high speciation rate, especially in mountainous areas (Kadereit & Abbott, 2021). In the high Andes, glacial cycles were primarily associated with vertical and minimal horizontal displacement of vegetation zones, causing a downward shift of 1200-1400 m in the treeline at the Last Glacial Maximum and an upward shift in elevation during interglacial periods (Graham, 2009;Hooghiemstra et al., 2006;Jomelli et al., 2014;Nevado et al., 2018;Simpson, 1974;Van der Hammen & Cleef, 1986). The occurrence of close relatives of Puya at different elevations at the same latitude and frequent transitions between adjacent cordilleras lends support to the proposal that Pleistocene glaciation cycles were responsible for allopatric speciation in this group via a glacial "pump" (Jabaily & Sytsma, 2013). Therefore, we agree with Sytsma (2010, 2013) that the recent uplift of the Andes since the late Miocene and subsequent Pleistocene glaciation cycles may have triggered the origin of the major clades delimited in this study (i.e., Clade I, Clade II, and Clade III) and recently rapid speciation in Puya.

| Limitations of this study
We are aware that our sampling regime is limited to 13 of the 229 species of Puya, and the three main clades found in this study, corresponding to "Chilean Puya," "Central Andes Puya," and "Northern Andes Puya," may not represent the full diversity of taxa in the genus Puya. Thus, the geographic patterns found in our study should be further investigated by an increased sampling, especially for those species distributed on the border of Chile. Moreover, although orogeny and climate change are often considered to be the major driving factors for the diversification of species in the high Andes (Hoorn et al., 2022), and the estimated divergence time in this study is consistent with these occurrences, other biotic factors (e.g., pollinator interactions and competition), likely contribute to the patterns found in this study. Future studies should include quantitative ecological approaches that will help discern additional potential sources of variation driving the diversification of Puya in the Andes.

| CON CLUS IONS
We reported 12 newly generated plastomes and compared them with five published ones, representing 13 Puya species.
Comparative analyses of Puya plastomes, representing the major clades, suggested that the quadripartite structure and identical gene content were conserved. The genes of rbcL, rpoC1, rpl20, and ycf1 might be related to habitat adaptation and plant growth and reproduction under selection pressure. We tested complete plastomes, hypervariable regions, and standard DNA barcodes for phylogenetic recontraction and species discrimination. The phylogenetic tree built by the former two datasets provided stronger discrimination power and support for three major clades: "Chilean Puya," "Central Andes Puya," and "Northern Andes Puya." Twelve hypervariable regions could be used as potential DNA barcodes for the genus Puya. Among these datasets, the complete plastomes greatly improved species resolution in Puya and showed high potential for future studies. Divergence time within Puya shed insight on the rapid radiation of this genus related to Andean orogeny and Pleistocene climate oscillations.

ACK N OWLED G M ENTS
We are grateful to Prof. Asunción Cano for field assistance and spe-

CO N FLI C T O F I NTE R E S T
The authors declare that they have no competing interests.

DATA AVA I L A B I L I T Y S TAT E M E N T
Twelve assembled plastomes and their raw sequencing data described in this article are publicly available in the National Center for Biotechnology Information (NCBI) database under project